noise condition
Online and Differentially-Private Tensor Decomposition
Tensor decomposition is an important tool for big data analysis. In this paper, we resolve many of the key algorithmic questions regarding robustness, memory efficiency, and differential privacy of tensor decomposition. We propose simple variants of the tensor power method which enjoy these strong properties. We present the first guarantees for an online tensor power method which has a linear memory requirement. Moreover, we present a noise-calibrated tensor power method with efficient privacy guarantees. At the heart of all these guarantees lies a careful perturbation analysis derived in this paper which improves upon the existing results significantly.
Improved Analysis of the Accelerated Noisy Power Method with Applications to Decentralized PCA
Aguié, Pierre, Even, Mathieu, Massoulié, Laurent
We analyze the Accelerated Noisy Power Method, an algorithm for Principal Component Analysis in the setting where only inexact matrix-vector products are available, which can arise for instance in decentralized PCA. While previous works have established that acceleration can improve convergence rates compared to the standard Noisy Power Method, these guarantees require overly restrictive upper bounds on the magnitude of the perturbations, limiting their practical applicability. We provide an improved analysis of this algorithm, which preserves the accelerated convergence rate under much milder conditions on the perturbations. We show that our new analysis is worst-case optimal, in the sense that the convergence rate cannot be improved, and that the noise conditions we derive cannot be relaxed without sacrificing convergence guarantees. We demonstrate the practical relevance of our results by deriving an accelerated algorithm for decentralized PCA, which has similar communication costs to non-accelerated methods. To our knowledge, this is the first decentralized algorithm for PCA with provably accelerated convergence.
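To make the setting above concrete, here is a minimal sketch of a power iteration with inexact matrix-vector products and an optional momentum term. It is not the authors' algorithm or analysis: the function name `noisy_power_method`, the Gaussian noise model, the fixed start vector, and the parameters `noise_scale` and `momentum` are all assumptions chosen for illustration. The momentum update `A x_t - momentum * x_{t-1}` follows the commonly used form of accelerated power iteration; setting `momentum=0.0` gives the plain noisy power method.

```python
import math
import random


def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def noisy_power_method(A, steps=50, noise_scale=1e-3, momentum=0.0, seed=0):
    """Power iteration where each product A @ x is perturbed by noise.

    Assumptions (illustrative only): additive Gaussian noise of standard
    deviation `noise_scale` per coordinate, and a fixed uniform start vector.
    """
    rng = random.Random(seed)
    n = len(A)
    x = [1.0 / math.sqrt(n)] * n   # current iterate (unit norm)
    x_prev = [0.0] * n             # previous iterate, used by the momentum term
    for _ in range(steps):
        Ax = matvec(A, x)
        # Inexact product plus momentum correction.
        y = [
            Ax[i] - momentum * x_prev[i] + noise_scale * rng.gauss(0.0, 1.0)
            for i in range(n)
        ]
        norm = math.sqrt(sum(v * v for v in y))
        # Rescale both iterates by the same factor so the recursion stays consistent.
        x_prev = [v / norm for v in x]
        x = [v / norm for v in y]
    return x
```

For a matrix with eigenvalues 3, 1, and 0.5, a momentum of about 0.25 (the square of the second eigenvalue divided by 4) is a typical choice in the accelerated variant; with small noise, both settings should return a vector close to the top eigenvector.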
Beyond Tsybakov: Model Margin Noise and $\mathcal{H}$-Consistency Bounds
We introduce a new low-noise condition for classification, the Model Margin Noise (MM noise) assumption, and derive enhanced $\mathcal{H}$-consistency bounds under this condition. MM noise is weaker than the Tsybakov noise condition: it is implied by the Tsybakov noise condition but can hold even when Tsybakov fails, because it depends on the discrepancy between a given hypothesis and the Bayes classifier rather than on the intrinsic distributional minimal margin (see Figure 1 for an illustration of an explicit example). This hypothesis-dependent assumption yields enhanced $\mathcal{H}$-consistency bounds for both binary and multi-class classification. Our results extend the enhanced $\mathcal{H}$-consistency bounds of Mao, Mohri, and Zhong (2025a) with the same favorable exponents but under a weaker assumption than the Tsybakov noise condition; they interpolate smoothly between linear and square-root regimes for intermediate noise levels. We also instantiate these bounds for common surrogate loss families and provide illustrative tables.
Global-focal Adaptation with Information Separation for Noise-robust Transfer Fault Diagnosis
Ren, Junyu, Gan, Wensheng, Zhang, Guangyu, Zhong, Wei, Yu, Philip S.
Rotating machinery [1] is critical in industrial applications, where system reliability is essential to avoid financial losses and safety risks. Therefore, timely fault diagnosis is a crucial engineering priority. Deep learning-based fault diagnosis has achieved remarkable success due to its ability to extract features and model complex nonlinear relationships [2, 3]. However, industrial rotating machines operate under diverse conditions, leading to domain shifts that degrade the diagnostic performance of conventional deep learning methods [4]. Among the powerful artificial intelligence (AI) technologies, transfer learning [5] can address these limitations through cross-task knowledge transfer, where domain adaptation has become a widely adopted technique in fault diagnosis, primarily encompassing metric-based approaches, adversarial frameworks, and their hybrid variants [4, 6]. Currently, cross-domain fault diagnosis methods have been extended to encompass a wider range of diverse and practical application scenarios [7]. Given that source domain data are often more abundant in real-world settings, several studies have proposed multi-source transfer fault diagnosis approaches [8, 9]. For closed-set scenarios, various domain adaptation methods have been developed [10]. Since the label categories between source and target domains may not be completely identical, open-set domain adaptation and partial domain adaptation methods have been developed for fault diagnosis [11].
Joint Learning using Mixture-of-Expert-Based Representation for Enhanced Speech Generation and Robust Emotion Recognition
Tzeng, Jing-Tong, Busso, Carlos, Lee, Chi-Chun
Speech emotion recognition (SER) plays a critical role in building emotion-aware speech systems, but its performance degrades significantly under noisy conditions. Although speech enhancement (SE) can improve robustness, it often introduces artifacts that obscure emotional cues and adds computational overhead to the pipeline. Multi-task learning (MTL) offers an alternative by jointly optimizing SE and SER tasks. However, conventional shared-backbone models frequently suffer from gradient interference and representational conflicts between tasks. To address these challenges, we propose the Sparse Mixture-of-Experts Representation Integration Technique (Sparse MERIT), a flexible MTL framework that applies frame-wise expert routing over self-supervised speech representations. Sparse MERIT incorporates task-specific gating networks that dynamically select from a shared pool of experts for each frame, enabling parameter-efficient and task-adaptive representation learning. Experiments on the MSP-Podcast corpus show that Sparse MERIT consistently outperforms baseline models on both SER and SE tasks. Under the most challenging condition of -5 dB signal-to-noise ratio (SNR), Sparse MERIT improves SER F1-macro by an average of 12.0% over a baseline relying on a SE pre-processing strategy, and by 3.4% over a naive MTL baseline, with statistical significance on unseen noise conditions. For SE, Sparse MERIT improves segmental SNR (SSNR) by 28.2% over the SE pre-processing baseline and by 20.0% over the naive MTL baseline. These results demonstrate that Sparse MERIT provides robust and generalizable performance for both emotion recognition and enhancement tasks in noisy environments.
Optimal Convergence Rates of Deep Neural Network Classifiers
Zhang, Zihan, Shi, Lei, Zhou, Ding-Xuan
In this paper, we study the binary classification problem on $[0,1]^d$ under the Tsybakov noise condition (with exponent $s \in [0,\infty]$) and the compositional assumption. This assumption requires the conditional class probability function of the data distribution to be the composition of $q+1$ vector-valued multivariate functions, where each component function is either a maximum value function or a Hölder-$\beta$ smooth function that depends only on $d_*$ of its input variables. Notably, $d_*$ can be significantly smaller than the input dimension $d$. We prove that, under these conditions, the optimal convergence rate for the excess 0-1 risk of classifiers is $$ \left( \frac{1}{n} \right)^{\frac{\beta\cdot(1\wedge\beta)^q}{\frac{d_*}{s+1}+\left(1+\frac{1}{s+1}\right)\cdot\beta\cdot(1\wedge\beta)^q}}, $$ which is independent of the input dimension $d$. Additionally, we demonstrate that ReLU deep neural networks (DNNs) trained with hinge loss can achieve this optimal convergence rate up to a logarithmic factor. This result provides theoretical justification for the excellent performance of ReLU DNNs in practical classification tasks, particularly in high-dimensional settings. The technique used to establish these results extends the oracle inequality presented in our previous work. The generalized approach is of independent interest.
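As a quick way to read the rate above, the following sketch evaluates the exponent of $1/n$ for given values of $\beta$, $q$, $d_*$, and $s$. The function name `rate_exponent` and its argument names are hypothetical; the arithmetic simply transcribes the formula in the abstract, with $s = \infty$ handled through floating-point infinity.

```python
import math


def rate_exponent(beta, q, d_star, s):
    """Exponent r such that the stated optimal rate is (1/n) ** r.

    Transcribes the abstract's formula:
        r = b / (d_star / (s + 1) + (1 + 1 / (s + 1)) * b),
    where b = beta * min(1, beta) ** q.
    Passing s = math.inf makes both 1 / (s + 1) terms vanish.
    """
    b = beta * min(1.0, beta) ** q
    inv = 1.0 / (s + 1.0)  # equals 0.0 when s is math.inf
    return b / (d_star * inv + (1.0 + inv) * b)
```

The input dimension $d$ does not appear among the arguments, which reflects the abstract's point that the rate depends on $d_*$ rather than $d$. As $s$ grows, the exponent approaches 1, so lower noise corresponds to a faster rate.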
Bridging Unsupervised and Semi-Supervised Anomaly Detection: A Theoretically-Grounded and Practical Framework with Synthetic Anomalies
Lau, Matthew, Zhou, Tian-Yi, Yuan, Xiangchi, Chen, Jizhou, Lee, Wenke, Huo, Xiaoming
Anomaly detection (AD) is a critical task across domains such as cybersecurity and healthcare. In the unsupervised setting, an effective and theoretically-grounded principle is to train classifiers to distinguish normal data from (synthetic) anomalies. We extend this principle to semi-supervised AD, where training data also include a limited labeled subset of anomalies possibly present at test time. We propose a theoretically-grounded and empirically effective framework for semi-supervised AD that combines known and synthetic anomalies during training. To analyze semi-supervised AD, we introduce the first mathematical formulation of semi-supervised AD, which generalizes unsupervised AD. Here, we show that synthetic anomalies enable (i) better anomaly modeling in low-density regions and (ii) optimal convergence guarantees for neural network classifiers -- the first theoretical result for semi-supervised AD. We empirically validate our framework on five diverse benchmarks, observing consistent performance gains. These improvements also extend beyond our theoretical framework to other classification-based AD methods, validating the generalizability of the synthetic anomaly principle in AD.
Adversarial Surrogate Risk Bounds for Binary Classification
A central concern in classification is the vulnerability of machine learning models to adversarial attacks. Adversarial training, which involves minimizing an adversarial surrogate risk, is one of the most popular techniques for training robust classifiers. Recent work characterized when a minimizing sequence of an adversarial surrogate risk is also a minimizing sequence of the adversarial classification risk for binary classification, a property known as adversarial consistency. However, these results do not address the rate at which the adversarial classification risk converges to its optimal value for such a sequence of functions that minimize the adversarial surrogate. This paper provides surrogate risk bounds that quantify that convergence rate. Additionally, we derive distribution-dependent surrogate risk bounds in the standard (non-adversarial) learning setting that may be of independent interest.